44 research outputs found

    Bipartite graph for topic extraction

    This article presents a bipartite graph propagation method that can be applied to different unsupervised machine learning tasks, such as topic extraction and clustering. We introduce the objectives and hypotheses that motivate the use of a graph-based method, and we give the intuition behind the proposed Bipartite Graph Propagation Algorithm. The contribution of this study is the development of a new method that allows heuristic knowledge to be used to discover topics in textual data more easily than is possible in the traditional mathematical formalism based on Latent Dirichlet Allocation (LDA). Initial experiments demonstrate that our Bipartite Graph Propagation algorithm returns good results in a static context (offline algorithm). Our research is now focusing on large amounts of data and on dynamic contexts (online algorithm). Funding: São Paulo Research Foundation (FAPESP) (proj. number 2011/23689-9).

    A comparison of the effect of feature selection and balancing strategies upon the sentiment classification of Portuguese news stories

    Sentiment classification of news stories using supervised learning is a mature task in the field of Natural Language Processing. Supervised learning strategies rely upon training data to induce a classifier. Training data can be imbalanced, typically with the neutral class as the majority class. This imbalance can bias the induced classifier towards the majority class. Balancing and feature selection can mitigate the effects of imbalanced data. This paper surveys a number of common balancing and feature selection techniques, and applies them to an imbalanced data set of manually labelled Brazilian agricultural news stories. The strategies were appraised with a 90:10 holdout evaluation and compared with a baseline strategy. We found that (1) the feature selection strategies provided no identifiable advantage over a baseline method and (2) balancing produced an advantage over the baseline, with random oversampling producing the best results. Funding: FAPESP (grant 11/20451-1).
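
    The random oversampling mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_oversample`, the toy texts, and the labels are all hypothetical, and the sketch simply duplicates randomly chosen minority-class samples until every class matches the size of the largest class.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the majority class.
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the class itself.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy, made-up data: the neutral class dominates, as the abstract describes.
texts = ["n1", "n2", "n3", "n4", "n5", "n6", "p1", "p2", "neg1"]
labels = ["neutral"] * 6 + ["positive"] * 2 + ["negative"]

bal_texts, bal_labels = random_oversample(texts, labels)
print(Counter(bal_labels))
```

    In a holdout evaluation such as the 90:10 split described above, oversampling would normally be applied to the training portion only, so that duplicated samples cannot leak into the test set.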

    Graph construction based on labeled instances for semi-supervised learning

    Semi-Supervised Learning (SSL) techniques have become very relevant since they require only a small set of labeled data. In this context, graph-based algorithms have gained prominence due to their capacity to exploit, besides information about data points, the relationships among them. Moreover, data represented as graphs allow the use of collective inference (vertices can affect each other), propagation of labels (autocorrelation among neighbors) and the neighborhood characteristics of a vertex. An important step in graph-based SSL methods is the conversion of tabular data into a weighted graph. Graph construction plays a key role in the quality of classification in graph-based methods. This paper explores a method for graph construction that uses the available labeled data. We provide extensive experiments showing that the proposed method has many advantages: good classification accuracy, quadratic time complexity, no sensitivity to the parameter k > 10, sparse graph formation with an average degree around 2, and hub formation from the labeled points, which facilitates the propagation of labels. Funding: São Paulo Research Foundation (FAPESP) (grants 2011/21880-3 and 2011/22749-8).

    Causation generalization through the identification of equivalent nodes in causal sparse graphs constructed from text using node similarity strategies

    Causal Bayesian Graphs can be constructed from causal information in text. These graphs can be sparse because the cause or effect event can be expressed in various ways that represent the same information. This sparseness can corrupt inferences made on the graph. This paper proposes to reduce sparseness by merging equivalent nodes and their edges. It presents a number of experiments that evaluate the applicability of node similarity techniques to detecting equivalent nodes. The experiments found that techniques that rely upon a combination of node contents and structural information are the most accurate strategies. Specifically, we employed (1) node name similarity and (2) a combination of node name similarity and common neighbours (SMCN). In addition, SMCN returns "better" equivalent nodes than the string matching strategy. Funding: São Paulo Research Foundation (FAPESP) (grants 2013/12191-5, 2011/22749-8 and 2011/20451-1).

    Exploiting Social and Mobility Patterns for Friendship Prediction in Location-Based Social Networks

    Link prediction is a "hot topic" in network analysis and has been widely used for friendship recommendation in social networks. With the increased use of location-based services, it is possible to improve the accuracy of link prediction methods by using the mobility of users. The majority of link prediction methods focus on the importance of locations for their visitors, disregarding the strength of the relationships existing between these visitors. We therefore propose three new methods for friendship prediction that efficiently combine the social and mobility patterns of users in location-based social networks (LBSNs). Experiments conducted on real-world datasets demonstrate that our proposals achieve performance competitive with methods from the literature and, in most cases, outperform them. Moreover, our proposals use fewer computational resources by considerably reducing the number of irrelevant predictions, making the link prediction task more efficient and applicable to real-world applications.

    Influence maximization based on the least influential spreaders

    The emergence of social media increases the need to recognize social influence, mainly motivated by online advertising, political and health campaigns, recommendation systems, epidemiological studies, etc. In spreading processes, it is possible to define the most central or influential vertices according to the network topology and dynamics. The least influential spreaders, on the other hand, have been disregarded. This paper aims to maximize the mean information propagation on the network by identifying the least influential individuals and making them better spreaders. Experimental results confirm that selecting 0.5% of the least influential spreaders in three social networks (Google+, Hamsterster and Advogato) and rewiring one connection of each to some important vertex increases the propagation over the entire network. Funding: National Council for Scientific and Technological Development (CNPq) (grant: 140688/2013-7) and São Paulo Research Foundation (FAPESP) (grant: 2011/21880-3).

    Link prediction in graph construction for supervised and semi-supervised learning

    Many real-world domains are relational in nature since they consist of a set of objects related to each other in complex ways. However, there are also flat data sets, and if we want to apply graph-based algorithms to them, it is necessary to construct a graph from the data. This paper aims to (i) increase the exploration of graph-based algorithms and (ii) propose new techniques for graph construction from flat data. Our proposal focuses on constructing graphs using link prediction measures, which predict the existence of links between entities from an initial graph. Starting from a basic graph structure such as a minimum spanning tree, we apply a link prediction measure to add new edges to the graph. The link prediction measures considered here are based on the structural similarity of the graph and improve the graph connectivity. We evaluate our proposal for graph construction in supervised and semi-supervised classification and confirm that the resulting graphs achieve better accuracy. Funding: São Paulo Research Foundation (FAPESP) (grants: 2013/12191-5, 2011/21880-3 and 2011/22749-8).
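
    The construction outlined in this abstract (a minimum spanning tree extended with predicted links) might look roughly like the sketch below. It rests on several assumptions not stated in the abstract: Euclidean distance between points, common-neighbour count as the link prediction measure, scores computed once on the initial tree, and a fixed number of added edges. The function names and the toy points are hypothetical.

```python
import itertools
import math


def minimum_spanning_tree(points):
    """Kruskal's algorithm over the complete Euclidean graph of `points`."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    candidates = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    edges = set()
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it cannot form a cycle
            parent[ri] = rj
            edges.add((i, j))
    return edges


def add_predicted_links(n, edges, num_new):
    """Add up to `num_new` edges between non-adjacent vertex pairs,
    ranked by their number of common neighbours in the initial graph."""
    neighbours = {i: set() for i in range(n)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    scored = []
    for i, j in itertools.combinations(range(n), 2):
        if j not in neighbours[i]:
            score = len(neighbours[i] & neighbours[j])
            if score > 0:
                scored.append((score, i, j))

    # Highest score first; ties broken by vertex indices for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return set(edges) | {(i, j) for _, i, j in scored[:num_new]}


# Toy, made-up flat data set: five points in the plane.
points = [(0, 0), (1, 0), (2, 0), (1, 1), (5, 5)]
tree = minimum_spanning_tree(points)
graph = add_predicted_links(len(points), tree, num_new=2)
print(sorted(tree))
print(sorted(graph))
```

    Other structural similarity measures could be swapped in for the common-neighbour count by changing how `score` is computed.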

    Lexical resources for the identification of causative relations in Portuguese texts

    The identification of causal relations from text is a mature problem in Natural Language Processing. There are a number of resources and tools to aid causative relation extraction in English, but there seem to be few resources for Portuguese. This paper presents a number of resources designed to help researchers and practitioners extract causative relations from Portuguese texts. Funding: FAPESP (grant number: 11/20451-1).